DCU at the NTCIR-9 SpokenDoc Passage Retrieval Task

Authors

  • Maria Eskevich
  • Gareth J. F. Jones
Abstract

We describe our runs and the results obtained for the "IR for Spoken Documents (SpokenDoc) Task" at NTCIR-9. Our participation in this task focused on investigating segmentation methods that divide the manual and ASR transcripts into topically coherent segments. The underlying assumption of this approach is that these segments will capture the passages in a transcript that are relevant to the query. Our experiments investigate two lexical-coherence-based segmentation algorithms, TextTiling and C99. We apply them to the provided manual and ASR transcripts, and to the ASR transcript with stop words removed. Evaluation shows that TextTiling consistently outperforms C99, both in segmenting the data into retrieval units, as measured by the centre-located relevant information metric, and in achieving higher content precision within each automatically created segment.
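
The abstract names TextTiling as one of the lexical-coherence segmenters applied to the transcripts. The following is a minimal Python sketch of a simplified TextTiling-style segmenter, not the authors' implementation. It assumes the transcript is already tokenised into units (for example utterances). It scores each gap between units by the cosine similarity of the k units on either side, computes a depth score for each similarity valley, and places boundaries at deep valleys, using a cutoff of mean minus half a standard deviation of the depth scores. The function names, the block size `k`, the local-maximum filter controlled by `min_gap` (standing in for TextTiling's rule against boundaries that are too close together), and the omission of score smoothing are all hypothetical simplifications.

```python
import math
import statistics
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gap_scores(units: list[list[str]], k: int) -> list[float]:
    """Similarity at each gap (between units[i-1] and units[i]),
    comparing the k units before the gap with the k units after it."""
    scores = []
    for i in range(1, len(units)):
        left = Counter(t for u in units[max(0, i - k):i] for t in u)
        right = Counter(t for u in units[i:i + k] for t in u)
        scores.append(cosine(left, right))
    return scores


def depth_scores(scores: list[float]) -> list[float]:
    """Depth of each valley: climb left and right while the scores keep rising."""
    depths = []
    for i, s in enumerate(scores):
        left_peak = s
        for j in range(i - 1, -1, -1):
            if scores[j] < left_peak:
                break
            left_peak = scores[j]
        right_peak = s
        for j in range(i + 1, len(scores)):
            if scores[j] < right_peak:
                break
            right_peak = scores[j]
        depths.append((left_peak - s) + (right_peak - s))
    return depths


def texttiling(units: list[list[str]], k: int = 2, min_gap: int = 1) -> list[list[list[str]]]:
    """Split a tokenised transcript into topically coherent segments (simplified)."""
    if len(units) < 3:
        return [units]
    depths = depth_scores(gap_scores(units, k))
    cutoff = statistics.mean(depths) - statistics.pstdev(depths) / 2
    boundaries = []
    for i, d in enumerate(depths):
        window = depths[max(0, i - min_gap):i + min_gap + 1]
        # Hypothetical simplification: keep only deep valleys that are local depth maxima.
        if d > 0 and d > cutoff and d == max(window):
            boundaries.append(i + 1)  # depth index i is the gap just before units[i + 1]
    segments, start = [], 0
    for b in boundaries:
        segments.append(units[start:b])
        start = b
    segments.append(units[start:])
    return segments


units = [
    ["speech", "recognition", "acoustic", "model"],
    ["acoustic", "model", "speech", "decoder"],
    ["speech", "decoder", "recognition", "lattice"],
    ["retrieval", "query", "index", "ranking"],
    ["query", "ranking", "index", "bm25"],
    ["retrieval", "index", "bm25", "query"],
]
print([len(seg) for seg in texttiling(units)])  # prints [3, 3]
```

To mirror the stop-word-removed ASR condition described above, the caller could filter stop words out of each unit before calling `texttiling`. Each resulting segment would then serve as a retrieval unit.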

Similar resources

DCU at the NTCIR-11 SpokenQuery&Doc Task

We describe DCU's participation in the NTCIR-11 SpokenQuery&Document task. We participated in the spoken-query spoken content retrieval (SQ-SCR) subtask, using the slide group segments as the basic indexing and retrieval units. Our approach integrates normalised prosodic features into a standard BM25 weighting function to increase the weights of terms that are prominent in speech. Text queries and re...

Spoken Document Retrieval by Contents Complement and Keyword Expansion Using Subordinate Concept for NTCIR-SpokenDoc

We report the results of investigating which of the hypernym and hyponym relationships is more important for keyword expansion in retrieval. We also report the effect of keyword expansion and contents complement on spoken document retrieval in the SCR lecture retrieval and SCR passage retrieval tasks. Spoken Document Retrieval by contents complement and keyword expansion usi...

Spoken Document Retrieval Experiments for SpokenDoc at Ryukoku University (RYSDT)

In this paper, we describe the spoken document retrieval systems of Ryukoku University, which participated in the NTCIR-9 IR for Spoken Documents ("SpokenDoc") task. The NTCIR-9 "SpokenDoc" task has two subtasks: the "Spoken term detection (STD) subtask" and the "Spoken document retrieval (SDR) subtask". We participated in both subtasks as team RYSDT. In this paper, first, our STD systems are de...

DTW-Distance-Ordered Spoken Term Detection and STD-based Spoken Content Retrieval: Experiments at NTCIR-10 SpokenDoc-2

In this paper, we report our experiments at the NTCIR-10 SpokenDoc-2 task. We participated in both the STD and SCR subtasks of SpokenDoc. For the STD subtask, we applied a novel indexing method, called metric subspace indexing, which we previously proposed. One of the distinctive advantages of the method was that it could output the detection results in increasing order of distance without using any predefined...

Publication date: 2011